Learning to Rank in the era of transformers

With recent advances in NLP, ranking search results is increasingly an NLP problem. This post gives an overview of the basics as well as the most recent advances.

Nikhil Kasukurthi (Udaan), https://udaan.com
2021-06-01

Problem at hand

Throughout this post, we will discuss ranking through the lens of an e-commerce website: building a search engine that lets a user search for products.

As the figure shows, a user typically has only a rough idea of the products they are looking for. This problem can be modeled in the following way.

Figure 1: Introduction

Dataset

The dataset for this type of problem is typically collected by mining user-click logs. A product that the user clicks is treated as a positive and ranked higher. Other approaches use human annotators to rate the relevance of a set of documents given a query.

Evaluation criteria for Ranking

Before we dive into how to solve the problem, it is important to establish the criteria for evaluating it. In this case, the primary criterion is the relevance of the ranked documents to the user. There are multiple ways to measure this; the usual precision, recall and F-score are some of them. For IR specifically, Discounted Cumulative Gain (DCG), its normalized form NDCG, and Mean Reciprocal Rank (MRR) are used. We discuss them further below.

Discounted Cumulative Gain (DCG)
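DCG discounts the gain of each document by its rank, so relevant documents near the top of the list contribute more; NDCG divides by the DCG of the ideal ordering so scores are comparable across queries. A minimal sketch using the common exponential-gain form (function names are my own):

```python
import math

def dcg_at_k(relevances, k):
    """DCG over the top-k ranked relevance labels: gain (2^rel - 1)
    discounted by log2 of the 1-based rank plus one."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores an NDCG of 1.0; any misordering near the top drags the score down more than the same mistake near the bottom.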

Mean Reciprocal Rank (MRR)
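MRR averages, over all queries, the reciprocal of the rank at which the first relevant document appears. A small sketch (the input format is a simplifying assumption: one list of binary relevance labels per query, in ranked order):

```python
def mean_reciprocal_rank(results):
    """results: one list of 0/1 relevance labels per query, in ranked order.
    Each query contributes 1/rank of its first relevant document."""
    total = 0.0
    for labels in results:
        for rank, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(results)
```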

Learning to Rank - 101

The basis of ranking is rooted in three basic approaches to loss calculation: pointwise, pairwise and listwise ranking.

Formalizing the problem, we have a set of queries \(Q=\{q_1, q_2, \dots, q_n\}\), for each of which we need to retrieve a set of relevant documents \(D=\{d_1, d_2, \dots, d_k\}\). Each of the approaches below treats this problem differently.

Pointwise Ranking

In this setting, each document is treated independently with respect to the query: given a query, every document is assigned its own relevance score.
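Pointwise ranking thus reduces to ordinary regression (or classification) on per-document relevance labels. A toy sketch with made-up feature vectors and labels, using least squares as a stand-in for the scoring function:

```python
import numpy as np

# Hypothetical (query, document) feature vectors and graded relevance
# labels; any regressor works as the scoring function f(q, d).
X = np.array([[1.0, 0.2], [0.3, 0.9], [0.8, 0.5]])
y = np.array([3.2, 1.8, 2.9])  # relevance labels

w, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit a linear scorer
scores = X @ w                             # score each document independently
ranking = np.argsort(-scores)              # final ranking = sort by score
```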

Pairwise Ranking

Two documents are taken in conjunction and their precedence is predicted.
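A common way to turn two relevance scores into a precedence prediction is a logistic function of their difference, which yields the probability that the first document should outrank the second (a sketch; the function name is my own):

```python
import math

def pairwise_prob(score_i, score_j):
    """Probability that document i should rank above document j,
    modeled as a sigmoid of the score difference."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))
```

Note the symmetry: the probabilities for the two orderings of a pair sum to one.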

Listwise Ranking

The above two approaches do not take the inter-document relationships into account; it is assumed that all the documents are independent.

Listwise approaches come closest to directly modeling the metrics we defined earlier.
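One classic listwise formulation (in the style of ListNet) treats both the model scores and the true relevance labels as softmax distributions over the whole list and minimizes the cross-entropy between them. A pure-Python sketch:

```python
import math

def listwise_softmax_loss(scores, relevances):
    """ListNet-style loss: cross-entropy between the softmax distribution
    implied by the true labels and the one implied by the model scores."""
    def softmax(xs):
        m = max(xs)                         # subtract max for stability
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]
    p_true = softmax(relevances)
    p_model = softmax(scores)
    return -sum(t * math.log(m) for t, m in zip(p_true, p_model))
```

The loss is smallest when the model's score ordering matches the label ordering for the entire list, not just for isolated pairs.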

Traditional Ranking Approaches

Query-dependent methods typically use TF-IDF [cite this]; another popular lexical approach is BM25 1 [cite this].
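For concreteness, here is a toy implementation of the Okapi BM25 formula: each query term contributes its IDF, with term frequency saturated by \(k_1\) and normalized by document length via \(b\). Brute-force and untuned; real engines precompute statistics in an inverted index:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Okapi BM25 for one document; corpus is a list of tokenized docs."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        # tf saturation (k1) and length normalization (b)
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score
```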

Here we discuss some of the seminal works.

RankNet and LambdaRank

These are pairwise approaches. In RankNet, the query \(q_i\) and document \(d_k\) features are concatenated and fed through the model. The outputs for a pair of query-document combinations, \(f(q_i,d_1)\) and \(f(q_i,d_2)\), are compared: the network assigns a relevance score to each document, and the document with the higher score is given precedence. Once this is done for all pairs of documents, they are sorted in order of relevance. The model is trained with a cross-entropy loss. A drawback of RankNet is that it optimizes only over individual document pairs, while our evaluation metric, DCG, considers the relative ranking of the whole list: incorrectly ordering the first two documents should be penalized more heavily than incorrectly ordering the last two.
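The RankNet training objective can be sketched as a cross-entropy on the pairwise preference probability (a simplified scalar version; the real model backpropagates this loss through the scoring network):

```python
import math

def ranknet_pair_loss(s_i, s_j, label):
    """Cross-entropy on the pairwise preference. label = 1 if document i
    is more relevant than document j, 0 otherwise."""
    p = 1.0 / (1.0 + math.exp(-(s_i - s_j)))  # P(i ranked above j)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))
```

When the scores are tied the loss is \(\log 2\), and it shrinks as the score gap agrees with the label.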

LambdaRank is an extension of RankNet that mitigates this mismatch with the DCG metric: the gradient contribution of each document pair is scaled by the change in NDCG, \(|\Delta NDCG|\), that would result from swapping the two documents, so mistakes near the top of the list are penalized more heavily.
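The \(|\Delta NDCG|\) weighting can be sketched as follows (a simplified helper, not the full LambdaRank gradient; positions are 1-based):

```python
import math

def dcg_gain(rel, pos):
    """Contribution of a document with relevance `rel` at 1-based rank `pos`."""
    return (2 ** rel - 1) / math.log2(pos + 1)

def delta_ndcg(rel_i, rel_j, pos_i, pos_j, ideal_dcg):
    """|change in NDCG| if the documents at pos_i and pos_j swap places."""
    before = dcg_gain(rel_i, pos_i) + dcg_gain(rel_j, pos_j)
    after = dcg_gain(rel_i, pos_j) + dcg_gain(rel_j, pos_i)
    return abs(after - before) / ideal_dcg
```

Swapping a relevant and an irrelevant document at ranks 1 and 2 yields a much larger \(|\Delta NDCG|\) than the same swap at ranks 9 and 10, which is exactly the position sensitivity RankNet lacked.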

LambdaMART

Through gradient-boosted trees, LambdaMART looks at the entire set of documents, i.e., it is a listwise approach. Documents are branched on their features, and the results are aggregated across all the trees to give a final relevance prediction for each document.

The progression has been from RankNet to LambdaRank and then to LambdaMART.

Deep neural network approaches

Deep Structured Semantic Model (DSSM) for web search using clickthrough data.

In the case of RankNet, the relevance score of each query-document pair is computed and documents are ranked accordingly. With DSSM, the embeddings for the bag-of-words vectors of the query and the documents are computed independently, but with the same model. The cosine similarity between the query embedding and each document embedding then gives the semantic relevance score, and the documents are sorted by this score.
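The scoring step can be sketched in a few lines (embeddings and document names here are made up for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Rank documents by semantic relevance to the query embedding.
query_emb = [0.9, 0.1]                             # hypothetical embeddings
doc_embs = {"d1": [0.8, 0.2], "d2": [0.1, 0.9]}
ranked = sorted(doc_embs,
                key=lambda d: cosine(query_emb, doc_embs[d]),
                reverse=True)
```

Because query and document towers are independent, document embeddings can be precomputed offline and only the query needs encoding at serve time.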

Figure 2: DSSM Overview

To train the network, the posterior probability of a document given a query is computed from their semantic relevance score through a softmax function. Random negative samples are included in each minibatch, i.e., completely irrelevant documents are sampled alongside the relevant one, and the model is trained with a cross-entropy loss.
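That softmax-over-negatives objective can be sketched numerically (gamma is the smoothing factor on the cosine similarities; the similarity values are made up):

```python
import math

def posterior(pos_sim, neg_sims, gamma=10.0):
    """Softmax posterior of the clicked document, given its cosine
    similarity and those of randomly sampled negative documents."""
    exps = [math.exp(gamma * s) for s in [pos_sim] + neg_sims]
    return exps[0] / sum(exps)

# Training minimizes -log posterior over clicked (query, document) pairs.
loss = -math.log(posterior(0.9, [0.2, 0.1]))
```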

DSSM also uses word hashing to tackle the large dimensionality introduced when working with a web-scale vocabulary: words are broken into letter n-grams before being fed through the MLP layers shown in the figure.

Figure 3: DSSM Architecture

DRRM

TODO

DeText

DeText is a flexible pipeline that was built to rank LinkedIn’s user profiles, jobs and help FAQs based on search queries.

Figure 4: DeText Architecture

Figure 5: DeText Inference Pipeline

The Amazon search ranking incorporates a Graph Neural Network in tandem with a transformer encoder. The transformer is trained on both the queries and the documents.

Training

They train a DistilBERT with 6 layers and 768 hidden units.

Figure 6: Architecture

Inference

During inference, similarly to DSSM, they compute the cosine similarity between the query embedding and the document (product) embeddings, then retrieve the k-nearest neighbors of the query. (check this)
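A brute-force version of that retrieval step might look like this (production systems would use an approximate nearest-neighbor index instead; the embeddings are made up):

```python
import math

def top_k_neighbors(query_emb, product_embs, k=2):
    """Nearest products to the query by cosine similarity (brute force)."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    scored = sorted(product_embs.items(),
                    key=lambda kv: cos(query_emb, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]
```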

Acknowledgments


  1. 25 because it was the 25th formula that had worked for them.↩︎

Citation

For attribution, please cite this work as

Kasukurthi (2021, June 1). Nikhil Kasukurthi: Learning to Rank in the era of transformers. Retrieved from https://beta.rstudioconnect.com/content/11424/posts/ir_nls/

BibTeX citation

@misc{kasukurthi2021learning,
  author = {Kasukurthi, Nikhil},
  title = {Nikhil Kasukurthi: Learning to Rank in the era of transformers},
  url = {https://beta.rstudioconnect.com/content/11424/posts/ir_nls/},
  year = {2021}
}